A Cooperative Stochastic Model of Gene Expression 
Recent experiments at the level of a single cell have shown that gene expression occurs in abrupt
stochastic bursts. Further, in an ensemble of cells, the levels of proteins produced have a bimodal 
distribution. In a large fraction of cells, the gene expression is either off or has a high value. We 
propose a stochastic model of gene expression the essential features of which are stochasticity and 
cooperative binding of RNA polymerase. The model can reproduce the bimodal behaviour seen in 
experiments. 
Gene expression is a fundamental biological process in a cell. Genes
are part of DNA molecules and determine the structure of functional
molecules such as RNAs and proteins. In each cell, at any instant of
time, only a subset of the genes present is active in directing
RNA/protein synthesis; the expression of such a gene is said to be
'on'. The information present in a gene is expressed in the following
manner. In the first step of gene expression, the sequence along one
of the strands of the DNA molecule is copied, or transcribed, into an
RNA molecule (mRNA). The sequence of the mRNA molecule is then
translated into a sequence of amino acids, which in turn determines
the functional nature of the protein molecule produced. The rate and
temporal sequence of gene expression are responsible for many aspects
of biology. In the large majority of cases, the regulation of gene
expression occurs at the level of transcription, and hence an in-depth
understanding of transcription regulation is a central focus of
biology [ref].

Recent experiments (see Appendix A) provide evidence that gene
expression occurs in abrupt stochastic bursts at the level of an
individual cell [refs]. Also, in many cases, the levels of proteins
produced in a population of cells are distributed in a bimodal manner,
implying that in a large fraction of cells the gene expression is
either off or has a high value [ref]. In this paper, we propose a
stochastic model of gene expression which provides a possible
explanation of the observed bimodal behaviour.

Genes are transcribed into mRNA by an enzyme called RNA polymerase
(RNAP). The process is initiated with the binding of RNAP to a site
called the promoter, usually near the beginning of the transcribed
sequence. After the initial binding and subsequent conformational
changes, the enzyme begins synthesis of the RNA chain and gradually
moves along the DNA. The initial binding of RNAP to a promoter can be
prevented by the binding of a regulatory protein (R) to an overlapping
segment of DNA (called the operator), which turns off mRNA production.
There is a finite probability that the bound R molecule dissociates
from the operator at any instant of time. An RNAP molecule then has a
certain probability of binding to the promoter and initiating
transcription.

FIG. 1. Concentration of mRNA molecules [mRNA] in arbitrary units as a
function of time t. The parameter values are $p_1 = 0.5$, $p_2 = 0.5$,
$p_3 = 0.3$, $p_4 = 0.85$, $p_5 = 0.05$ and $\mu = 0.4$.

Each of the possibilities described above actually in- 
volves a series of physico-chemical processes, a detailed 
characterization of which is not required for the model 
of gene expression that we propose here. We represent 
a gene by a one-dimensional lattice of n+2 sites. The 
first two sites represent the operator and promoter re- 
spectively. The lattice is a coarse-grained description of 
an actual gene. In reality the operator and promoter re- 
gions may extend over a certain number of base pairs in 
the DNA and they can be overlapping or not. In our 
model they are represented as single sites. Each of the 
other sites in the lattice represents a finite number of 
base-pairs in the DNA molecule. 

The different physico-chemical processes are lumped together into a
few simple events which are random in nature. This lumping together
avoids unnecessary complexity that has no bearing on the basic nature
of the process. The operator (O) and promoter (P) together can be in
four possible configurations: 10, 01, 00 and 11. The numbers '1' and
'0' stand for 'occupied' and 'unoccupied'. The configuration ij
describes the occupation status of O (i) and P (j). For example, the
configuration 10 corresponds to O being occupied by an R molecule and
P being unoccupied. Similarly, in the configuration 01, O is
unoccupied and P is occupied by an RNAP molecule. The binding of R and
the binding of RNAP are mutually exclusive, so the configuration 11 is
strictly prohibited. Given a 00 configuration at time t, the
transition probabilities to configurations 10 and 01 at time t + 1 are
$p_1$ and $p_2$ respectively. The probability of remaining in the
configuration 00 is $1 - p_1 - p_2$. A 10 configuration at time t goes
to a 00 configuration at time t + 1 with probability $p_3$ and remains
unchanged with probability $1 - p_3$. We have assumed all the
probabilities to be time-independent. The justification for this
approximation is that the numbers of free R and RNAP molecules in the
cell are typically one or two orders of magnitude higher than the
number of DNA sites they occupy. The RNAP molecule, once bound to the
promoter, initiates transcription in the next time step, i.e., the 01
configuration makes a transition to a 00 configuration with
probability 1. The motion of RNAP is in the forward direction and the
molecule covers a unit distance (the distance between two successive
lattice sites) in each time step. Once the molecule reaches the last
site of the lattice, transcription ends and an mRNA is synthesized.
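
As an illustration of the transition rules above, the following Python
sketch treats the operator/promoter pair, without cooperativity, as a
three-state Markov chain and computes its stationary occupation. It is
not part of the original model description: the function names are
hypothetical, and $p_3$ is taken to be the probability that a bound R
molecule dissociates (10 to 00) in one time step.

```python
def op_transition_matrix(p1, p2, p3):
    """Row-stochastic matrix over the allowed OP configurations.

    State order: 0 -> '00', 1 -> '10' (R bound), 2 -> '01' (RNAP bound).
    Configuration '11' is left out because R and RNAP binding are
    mutually exclusive.
    """
    if min(p1, p2, p3) < 0 or p1 + p2 > 1 or p3 > 1:
        raise ValueError("need probabilities in [0, 1] with p1 + p2 <= 1")
    return [
        [1.0 - p1 - p2, p1, p2],  # from 00
        [p3, 1.0 - p3, 0.0],      # from 10: R dissociates with probability p3
        [1.0, 0.0, 0.0],          # from 01: RNAP leaves P with probability 1
    ]


def stationary_distribution(p1, p2, p3):
    """Closed-form stationary occupation of 00, 10 and 01 (needs p3 > 0)."""
    # Balance conditions: pi_10 * p3 = pi_00 * p1 and pi_01 = pi_00 * p2.
    pi_00 = 1.0 / (1.0 + p1 / p3 + p2)
    return [pi_00, pi_00 * p1 / p3, pi_00 * p2]


if __name__ == "__main__":
    T = op_transition_matrix(0.5, 0.5, 0.3)
    pi = stationary_distribution(0.5, 0.5, 0.3)
    # One further step of the chain should leave the distribution unchanged.
    pi_next = [sum(pi[i] * T[i][j] for i in range(3)) for j in range(3)]
    print([round(x, 4) for x in pi])
    print([round(x, 4) for x in pi_next])
```

For the Figure 1 values ($p_1 = p_2 = 0.5$, $p_3 = 0.3$) this gives
occupations of roughly 0.316, 0.526 and 0.158 for 00, 10 and 01, so
that without cooperativity the promoter would be occupied in about one
time step out of six.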

The second major feature of our model is the cooperative binding of
RNAP to the promoter when an adjacent RNAP molecule is present. This
implies that there is a higher probability of binding of RNAP to the
promoter in one time step if another RNAP molecule is present at the
site next to the promoter. In our model, the probability of
cooperative binding of RNAP is $p_4$, which is larger than $p_2$. The
probabilities $p_1$ and $1 - p_1 - p_2$ are changed to new values
$p_5$ and $1 - p_4 - p_5$ respectively. Degradation of mRNA is taken
into account by assuming the decay rate to be given by $\mu N$, where
N is the number of mRNAs present at time t.

The number of mRNAs produced as a function of time is studied by Monte
Carlo simulation. For the sake of simplicity, we have not tried to
simulate protein levels or enzymatic products thereof, i.e., we study
gene expression up to the level of transcription (mRNA synthesis).
Since the number of protein molecules and converted products should be
proportional to the mRNA present, no loss of generality is introduced
by this simplification. The lattice consists of 52 sites (n = 50).
Stochastic events are simulated with the help of a random number
generator. The updating rule of our cellular automaton (CA) model is
that in each time step t the occupation status (0 or 1) of each site
(except for the O site) at time t - 1 is transferred to the
nearest-neighbour site towards the right. If the (n+2)-th, i.e., the
last, site is 1 at t - 1, an mRNA is synthesized at t and the number
of mRNAs increases by 1. In the same time step, the configuration ij
of OP is determined with the probabilities already specified. Thus in
each time step, the RNAP molecule, if present on the gene, moves
forward by a unit lattice distance (progression of transcription),
followed by the updating of the OP configuration.

Figure 1 shows the concentration [mRNA] of mRNA molecules in the cell
as a function of time for the parameter values $p_1 = 0.5$,
$p_2 = 0.5$, $p_3 = 0.3$, $p_4 = 0.85$, $p_5 = 0.05$ and $\mu = 0.4$.
Note that cooperativity is assumed to increase the probability of RNAP
binding from $p_2 = 0.5$ to $p_4 = 0.85$ for these values (a 1.7-fold
increase); for the values used in Figure 3, where $p_2 = 0.2$, the
increase is about four-fold. The stochastic nature of the gene
expression is evident from the figure, with random intervals between
the bursts of activity. One also notices the presence of several
bursts of large size. It is important to emphasize that the frequency
of transitions between high and low expression levels is a function of
the parameter values chosen and may be low for certain parameter
values.

For the probability values considered, the two predominantly
favourable states are when the gene expression is off (state 1) and
when a large amount of gene expression takes place (state 2). In the
absence of RNAP, state 1 has greater weightage, but with the chance
binding of RNAP to the promoter (the probability $p_2$ for this is
small), the weight shifts to state 2 until another stochastic event
terminates cooperative binding and the gene reverts to state 1. The
probability of obtaining a train of N successive transcribing RNAP
molecules is $p_2 p_4^{N-1} (1 - p_4)$. This is the geometric
distribution function, and the mean and the variance of the
distribution are given by $p_2/(1 - p_4)$ and
$p_2 (1 + p_4 - p_2)/(1 - p_4)^2$ respectively.
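
The mean and variance quoted above for the train-length distribution
can be checked numerically. The sketch below is an illustration rather
than part of the original analysis: the function names are
hypothetical, and it assumes that the remaining probability,
$1 - p_2$, is assigned to a train of length zero so that the
distribution is normalised.

```python
def train_length_pmf(n, p2, p4):
    """Probability of a train of n successive transcribing RNAP molecules."""
    if n == 0:
        return 1.0 - p2  # assumed: no RNAP binds, so the train is empty
    # First RNAP binds with p2, each further one with p4, then the train ends.
    return p2 * p4 ** (n - 1) * (1.0 - p4)


def moments_by_summation(p2, p4, n_max=2000):
    """Mean and variance from a truncated sum over the distribution."""
    mean = sum(n * train_length_pmf(n, p2, p4) for n in range(n_max + 1))
    second = sum(n * n * train_length_pmf(n, p2, p4) for n in range(n_max + 1))
    return mean, second - mean ** 2


def moments_closed_form(p2, p4):
    """Mean and variance from the closed-form expressions in the text."""
    mean = p2 / (1.0 - p4)
    variance = p2 * (1.0 + p4 - p2) / (1.0 - p4) ** 2
    return mean, variance


if __name__ == "__main__":
    print(moments_by_summation(0.5, 0.85))
    print(moments_closed_form(0.5, 0.85))
```

For $p_2 = 0.5$ and $p_4 = 0.85$ both routes give a mean of about 3.33
and a variance of about 30: a typical train is short, but the spread is
large compared with the mean, which is consistent with occasional
bursts of large size.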

For the probability values already specified, the simulation has been
repeated for an ensemble of 3000 cells. For each cell, the time
evolution is up to 10,000 time steps. Figure 2 shows the distribution
of the number N(m) of cells versus the fraction m of the maximal
number of mRNA molecules produced after 10,000 time steps. Two
distinct peaks are seen, corresponding to zero and maximal gene
expression respectively. Such a bimodal distribution occurs over a
wide range of parameter values. Figure 3 shows the distribution for
the parameter values $p_1 = 0.7$, $p_2 = 0.2$, $p_3 = 0.7$,
$p_4 = 0.85$, $p_5 = 0.05$ and $\mu = 0.5$. For the same set of
parameter values but with $p_3 = 0.1$, the bimodal distribution is
lost and one gets a single prominent peak corresponding to maximal
gene expression. Distributions with several peaks of random heights
are obtained when the parameter values do not produce the effect. The
full parameter space describing the three different regions of
unimodal, bimodal and multi-peak distributions has not been explored
in detail as yet. The transition from one region to another is in a
broad sense like a phase transition. Since the distribution of
transcribing RNAPs is bimodal in nature, many results, like the
distribution of time intervals in between bursts of gene expression,
can be written down from the stochastic theory of such distributions
[ref].
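
The cellular automaton and the ensemble experiment described above can
be sketched in Python as follows. This is a minimal sketch under stated
assumptions, not the authors' code. The function names are
hypothetical. Degradation is implemented by letting each mRNA decay
independently with probability $\mu$ per time step, so that the
expected loss is $\mu N$. A bound R molecule is assumed to dissociate
with probability $p_3$ whether or not an RNAP is nearby. The histogram
uses the mRNA count at the final time step, normalised by the largest
count in the ensemble.

```python
import random
from collections import deque


def simulate_cell(steps, n=50, p1=0.5, p2=0.5, p3=0.3, p4=0.85, p5=0.05,
                  mu=0.4, rng=None):
    """Return the number of mRNAs present after each time step in one cell."""
    if rng is None:
        rng = random.Random()
    operator = 0                  # 1 while a regulatory molecule R occupies O
    track = deque([0] * (n + 1))  # track[0] is P, track[1:] are the n gene sites
    mrna = 0
    history = []
    for _ in range(steps):
        # Shift every RNAP one site to the right; an RNAP that occupied the
        # last site at the previous step completes one transcript.
        mrna += track.pop()
        track.appendleft(0)
        # Update the operator/promoter configuration.
        if operator:
            if rng.random() < p3:           # 10 -> 00
                operator = 0
        else:
            cooperative = track[1] == 1     # RNAP on the site next to P
            bind_r, bind_rnap = (p5, p4) if cooperative else (p1, p2)
            u = rng.random()
            if u < bind_r:                  # 00 -> 10
                operator = 1
            elif u < bind_r + bind_rnap:    # 00 -> 01
                track[0] = 1
        # Assumed degradation rule: each mRNA decays with probability mu.
        mrna -= sum(rng.random() < mu for _ in range(mrna))
        history.append(mrna)
    return history


def ensemble_histogram(cells=100, steps=1000, bins=10, seed=0, **params):
    """Histogram of final mRNA levels as a fraction of the ensemble maximum."""
    rng = random.Random(seed)
    finals = [simulate_cell(steps, rng=rng, **params)[-1] for _ in range(cells)]
    top = max(finals) or 1        # avoid dividing by zero if every cell is off
    counts = [0] * bins
    for level in finals:
        counts[min(int(bins * level / top), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    print(ensemble_histogram())
```

The defaults use a much smaller ensemble and a shorter run than the
3000 cells and 10,000 time steps quoted in the text, to keep a
pure-Python run short. Whether the resulting histogram is bimodal
depends on the assumptions listed above, in particular on how
degradation and the measured mRNA level are defined.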

In summary, we have proposed a stochastic model 
which can reproduce the bimodal distribution in gene 
expression observed in recent experiments. We have sug- 
gested that the stochastic nature of transcription coupled 
with RNAP binding cooperativity may result in discon- 
tinuous levels of gene expression and consequent bimodal 
distribution of expressed protein levels, as observed in a 
number of experiments. To our knowledge, no stochas- 
tic mechanism of bimodal distribution has been offered 
so far. Increasing emphasis on the stochastic nature of 
the developmental switches operating at the level of transcription
suggests that the bimodal distribution of protein levels may have a
role to play in such mechanisms.

FIG. 2. Distribution of the number N(m) of cells expressing a fraction
m of the maximal number of mRNA molecules after 10,000 time steps. The
total number of cells is 3000. The parameter values are
$p_1 = p_2 = 0.5$, $p_3 = 0.3$, $p_4 = 0.85$, $p_5 = 0.05$ and
$\mu = 0.4$.



Appendix A 

In this Appendix, we discuss the various biological as- 
pects of the problem studied in this paper. 

Biological variability is a product of interaction of 
genes with the environment. With the advent of rapid 
genome sequencing methods and remarkable success in 
sequencing genomes from many organisms, the thrust is 
now gradually shifting to the functional aspects of infor- 
mation present in the genome. The genome of an organ- 
ism is a storehouse of sequential information contained in 
all the genes specific to that organism. Through gene ex- 
pression, the sequential information determines the struc- 
ture of functional molecules like RNAs and proteins. Since 
the advent of molecular biology, the regulation of gene 
expression has been studied in solution or in an ensem- 
ble of cells where an average property is measured. This 
mode of study was necessary, as it was difficult to obtain 
information at the level of an individual cell. A complete 
understanding of cellular processes, however, needs an 
appreciation of events at the level of an individual cell 
and extrapolation to an ensemble of cells. 

FIG. 3. Distribution of the number N(m) of cells expressing a fraction
m of the maximal number of mRNA molecules after 10,000 time steps. The
total number of cells is 3000. The parameter values are $p_1 = 0.7$,
$p_2 = 0.2$, $p_3 = 0.7$, $p_4 = 0.85$, $p_5 = 0.05$ and $\mu = 0.5$.

Recent advances have made it possible to study processes within a
single cell unmasked by ensemble averaging [ref]. The simplest event
one can study at the individual cell level is the expression of a
reporter gene such as lacZ or GFP. In the former case, the end product
is an enzyme, β-galactosidase, which is capable of hydrolyzing a
non-coloured substrate to a coloured product. In the latter case the
protein itself is fluorescent. Hence, the gene expression can be
directly studied either colorimetrically or fluorometrically at the
level of an individual cell. Recent experiments using such techniques
provide evidence that gene expression occurs in abrupt stochastic
bursts at the level of an individual cell [refs]. The stochastic
nature of gene expression is also evident when levels of
β-galactosidase are examined in an ensemble of cells. Levels of
β-galactosidase are distributed in a bimodal manner: in a large
fraction of cells the gene expression is either off or has a high
value [ref].

Some theories have been proposed so far to explain the so-called 'all
or none' phenomenon in gene expression. These theories are mostly
based on an autocatalytic feedback mechanism in which synthesis of the
gene product gives rise to the transport or production of an activator
molecule [refs]. While such processes are certainly possible, the
bimodal distribution is a much more general phenomenon and has now
been found in many types of cells, from bacterial to eukaryotic, and
for different types of promoters [refs].

The two major features of the model of gene expression that we have
proposed in this paper are stochasticity and cooperative binding of
RNA polymerase. As already explained in the paper, the different
physico-chemical processes associated with gene expression are lumped
together into a few simple events which are random in nature. To give
an example, for many prokaryotic promoters there is a two-step
reaction scheme in which the formation of a closed RNAP complex is
followed by isomerization to an open complex. RNAP initiates
transcription only from the open complex. The isomerization step is
rate-limiting in many cases [ref]. We define the on-rate of RNAP as
the composite of these several steps, ending with the attainment of
the open complex. The cooperative binding of RNAP to the promoter in
our model implies that there is a higher probability of binding of
RNAP to the promoter in one time step if another RNAP molecule is
present at the site next to the promoter. Although such binding
cooperativity has not been studied in prokaryotic polymerases, it has
been demonstrated in polio-virus RNA-dependent RNA polymerase [ref].
Cooperative binding of proteins to DNA is now well established. In
most cases of regulatory proteins, the binding cooperativity is
mediated through protein-protein interaction, although increasing
evidence of DNA-mediated effects is being reported [ref]. In the case
of RNAP binding to promoters, however, there are now widespread
reports of a transcription-generated increase in negative supercoiling
with a consequent increase in the rate of transcription [ref]. In many
promoters, transcription initiation is sensitive to the supercoiling
status of the DNA. It has been reported that transcription generates
increased negative supercoiling through several hundred base pairs
[15]. Thus it is entirely plausible, and likely, that active
transcription downstream of the promoter site may lead to increased
binding of RNAP and open-complex formation. One can also envisage
other mechanisms for generating this kind of cooperativity. For
example, if the polymerase-generated negative supercoiling (after
initial movement) inhibits binding of the repressor, it would
effectively increase the polymerase binding probability.

Transcription is one of the most important events in the life-cycle of
a cell. The temporal sequence of events occurring during transcription
is of utmost importance in its understanding and has been studied
extensively. General descriptions of transcription, as well as of
other cellular events, have tended to be deterministic in nature. In
the cell there are only a few DNA molecules and a few molecules of
free RNAP. It is likely that the number of molecules in the cell is
not high enough for a deterministic description of this small ensemble
to be correct. At the level of a single cell, probabilistic
descriptions are more appropriate. Increasingly, probabilistic
descriptions of cellular events, including transcription, are being
offered [refs].